(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 985 995 B1

(12) EUROPEAN PATENT SPECIFICATION

(45) Date of publication and mention of the grant of the patent:
13.08.2003 Bulletin 2003/33

(21) Application number: 98117083.0

(22) Date of filing: 09.09.1998

(51) Int Cl.7: G06F 1/00



(54) Method and apparatus for intrusion detection in computers and computer networks 

Verfahren und Vorrichtung zur Eindringdetektion in Rechnern und Rechnernetzen

Procédé et appareil de détection d'intrusion dans des ordinateurs et des réseaux d'ordinateurs



(84) Designated Contracting States: 
DE FR GB 

(43) Date of publication of application: 
15.03.2000 Bulletin 2000/11

(73) Proprietor: International Business Machines Corporation
Armonk, NY 10504 (US)

(72) Inventors: 

• DACIER, Marc C.
8134 Adliswil (CH)
• DEBAR, Hervé C.
8134 Adliswil (CH)
• WESPI, Andreas A.
8155 Niederhasli (CH)
• FLORATOS, Aris
Long Island City, New York 11106 (US)
• RIGOUTSOS, Isidore
Astoria, New York 11103 (US)

(74) Representative: Williams, Julian David 
International Business Machines Corporation 
Säumerstrasse 4/Postfach
8803 Rüschlikon (CH)






(56) References cited: 
US-A- 5 621 889 

• DENAULT S ET AL: "INTRUSION DETECTION: 
APPROACH AND PERFORMANCE ISSUES OF 
THE SECURENET SYSTEM" COMPUTERS & 
SECURITY INTERNATIONAL JOURNAL 
DEVOTED TO THE STUDY OF TECHNICAL AND 
FINANCIAL ASPECTS OF COMPUTER 
SECURITY, vol. 13, no. 6, 1 January 1994, pages 
495-508, XP000478665 

• I. RIGOUTSOS ET AL: "Combinatorial Pattern Discovery in Biological Sequences: The TEIRESIAS Algorithm" BIOINFORMATICS, vol. 14, no. 1, 1998, pages 55-67, XP002094907

• COOK J E ET AL: "AUTOMATING PROCESS DISCOVERY THROUGH EVENT-DATA ANALYSIS" PROCEEDINGS OF THE 17TH ANNUAL CONFERENCE ON SOFTWARE ENGINEERING, SEATTLE, APR. 23 - 30, 1995, no. CONF. 17, 23 April 1995, pages 73-82, XP000545655 ASSOCIATION FOR COMPUTING MACHINERY



Note: Within nine months from the publication of the mention of the grant of the European patent, any person may give 
notice to the European Patent Office of opposition to the European patent granted. Notice of opposition shall be filed in 
a written reasoned statement. It shall not be deemed to have been filed until the opposition fee has been paid. (Art. 
99(1) European Patent Convention). 



Printed by Jouve, 75001 PARIS (FR)






Description 
Technical Field 

[0001] This invention relates to intrusion detection, i.e. the detection of security problems in a computer network or on any computer within said network. It is particularly suited to detect outsiders trying to break into a computer system (e.g. via the net) and/or to detect insiders misusing the privileges they have received (e.g. someone internal reading confidential data that he/she is not entitled to). In brief, the invention uses a behavior-based approach for a pattern-oriented intrusion detection system.

Background of the Invention 

[0002] Generally, an intrusion detection system dynamically monitors actions that are taken in a given environment and decides whether these actions are symptomatic of an attack or constitute a legitimate use of the environment.

[0003] Essentially, two main intrusion detection methods are known. The first method uses the knowledge accumulated about attacks and looks for evidence of their exploitation. This method is referred to as knowledge-based. The second method builds a reference model of the usual behavior of the system being monitored and looks for deviations from the observed usage. This method is referred to as behavior-based.
[0004] In the knowledge-based approach, the underlying assumption is that the system knows all possible attacks. There is some kind of a signature for each attack and the intrusion detection system searches for these signatures when monitoring the traffic. For example, one may monitor the audit trails on a given machine, the packets going onto the net, etc. This first approach is addressed and described by Shieh et al. in US patent 5 278 901, which also gives a good overview of the technology. An advantage of this method is that no or only few false alarms are generated, i.e. the false alarm rate is low; the main disadvantage is that only those attacks can be located that are already known. Any newly developed intrusion attack would usually remain undetected since its signature is still unknown and thus the system does not search for it.
[0005] Unfortunately, there are nowadays so many attacks that the set of signatures is growing very fast. Also, some signatures are difficult to express and an algorithm to search for them can be rather time-consuming. Nevertheless, this approach has proven its usefulness and there are products using this approach available on the market: NetRanger by Cisco Systems, Inc., and RealSecure by Internet Security Systems, Inc., are two examples of such available products.
[0006] The second, behavior-based approach starts from the assumption that if an attack is carried out against a system, its "behavior" will change. Therefore, the approach is to define a kind of normal profile of a system and watch for any deviation from this defined normal profile. Different techniques can be applied (e.g. statistics, rule-based systems, neural networks) using different targets (e.g. the users of the system, the performance of the network, the CPU cycles, etc.). The main advantage of this method over the knowledge-based one is that the attacks do not need to be known in advance, i.e. unknown attacks can be detected. Thus, the detection remains up-to-date without having to update some database of known signatures. But there are disadvantages: deviations can occur without any attack (e.g. changes in the activity of the user, new software installed, new machines, new users, etc.). Therefore, all known efforts in this direction have been facing a rather high rate of false alarms. There appears to be only one product on the market using this approach: CMDS by Science Applications International Corporation.

[0007] In "A Sense of Self for Unix Processes" by S. Forrest et al., Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120-128, Oakland, California, May 1996, it is described how to model the behavior of the "sendmail daemon", i.e. a program running permanently in the background without user interaction, using the sequences of system calls that this program generates while running. The idea is to build a table of all the sequences of a given fixed length (here 5, 6, and 11) of consecutive system calls that could be found when watching such a sendmail daemon running. The claim was that if one tries to take advantage of a vulnerability in the sendmail code, then this would generate a sequence of system calls not found in a "normal" table, i.e. a table generated from a sample with normal behavior. However, when experimenting with this approach, one discovers that the table necessary can become fairly large. It must be stressed that all the sequences of system calls in this table have the same length, i.e. lengths of 5, 6, and 11. It has been shown in "Fixed vs. Variable-Length Patterns for Detecting Suspicious Process Behavior" by H. Debar, M. Dacier, M. Nassehi, and A. Wespi, Proceedings of ESORICS 98, Louvain-la-Neuve, Belgium, September 1998, that when trying to find what the best length for the sequences is ("best" meaning producing the shortest table of patterns while covering all possible sequences) the result is that the "best" length is 1. This means that the system does not search for unseen sequences but for unseen system calls. The consequence is that if an attack does not use any unseen system call it will not be detected. This is generally unacceptable since it may be possible to run an attack without using a previously unseen system call.
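The fixed-length idea discussed above can be illustrated with a minimal Python sketch. It is not the implementation of Forrest et al.; the function names and the system call traces are hypothetical and chosen only to show why a window length of 1 misses attacks that reuse known system calls.

```python
def build_table(traces, k):
    """Collect every length-k window of consecutive system calls seen in training."""
    table = set()
    for trace in traces:
        for i in range(len(trace) - k + 1):
            table.add(tuple(trace[i:i + k]))
    return table


def unseen_windows(trace, table, k):
    """Return the length-k windows of a new trace that are absent from the table."""
    windows = (tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))
    return [w for w in windows if w not in table]


# Hypothetical training traces; the call names are illustrative only.
normal = [["open", "read", "write", "close"],
          ["open", "read", "read", "close"]]

# A trace that uses only known calls, but in an order never seen in training.
suspect = ["open", "write", "read", "close"]

anomalies_k2 = unseen_windows(suspect, build_table(normal, 2), 2)
anomalies_k1 = unseen_windows(suspect, build_table(normal, 1), 1)
```

With a window length of 2 the reordered trace yields two unseen windows, whereas with a window length of 1 nothing is flagged, because every individual call already occurs in the training data.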
[0008] There are two classes of information sources for intrusion detection systems as described in "Towards a Taxonomy of Intrusion Detection Systems" by H. Debar, M. Dacier, and A. Wespi, IBM Research Report 3030, June 1998. Based on the location from where the information can be retrieved, a distinction is made between host-based and network-based intrusion detection systems. Examples of host-based information sources are the so-called C2 audit trails, the syslog files known in the UNIX operating system, or the event logs in Windows NT. Network-based information is mainly retrieved by analyzing the network packets.
[0009] As will be described in detail further below, the present invention relates to behavior-based intrusion detection using host-based information sources. For a given process, the intrusion detection system decides whether the process behavior can be judged as normal or abnormal. Abnormal behavior is an indication of an intrusion.

[0010] As mentioned above, Forrest et al. describe in "A Sense of Self for UNIX Processes", Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120-128, Oakland, California, May 1996, a process model using a set of fixed-length patterns. These patterns correspond to all the possible patterns that can be found in the event sequences recorded during the training phase.

[0011] This poses a problem since a careful look at 
the sequences of audit events that can be generated by 
the so-called ftp daemon running under AIX shows that 
there are very long subsequences which repeat fre- 
quently. For example, many process instantiations start 
with an identical subsequence that has a length of 40 
audit events. Thus, since the described fixed-length ap- 
proach does not consider such a characteristic, any re- 
sult of an intrusion detection method based on such a 
fixed-length approach is distorted and certain intrusions 
and/or misuses cannot be detected. 
[0012] In "Intrusion Detection via System Call Traces" by A.P. Kosoresow and S.A. Hofmeyr, IEEE Software, pp. 35-42, Sep./Oct. 1997, it is shown that variable-length patterns can be used to model the normal behavior of a process. However, the patterns presented in this publication were constructed manually due to the lack of an automated method. It is obvious that such a manual selection or design of the patterns is inadequate for an automatic intrusion detection of the kind here approached.

[0013] A significantly different approach for a pattern-oriented intrusion detection system is disclosed in US Patent 5 278 901 to Shieh et al. It shows an intrusion detection system based on object privilege and information flow, i.e. it does not use the deviation from a "typical" activity profile as described above. The approach by Shieh et al. is a knowledge-based intrusion detection system, which is in contrast to the behavior-based approach of the present invention. Furthermore, the Shieh patent covers mainly the problem of detecting violations against previously defined access control policies while the present invention aims at detecting any type of attack. The complexity of the solution chosen in the Shieh patent, however, makes this approach unsuitable to solve the problems which the present invention addresses.



[0014] To summarize, it is an object of the present invention to provide a simple and reliable method and apparatus for the detection of intrusions into a computer system, based on event patterns and particularly directed to detect deviations from a "normal" process behavior, and thus to detect attacks performed against said process. A more specific object is to generate, preferably automatically, so-to-speak "natural" patterns for the description of the process behavior and thus produce a very condensed resulting pattern table. Another specific object is to allow the use of highly efficient pattern matching algorithms, especially by producing a relatively small pattern matching table. A further specific object is to produce a pattern table with most representative patterns, independent of their length. A still further specific object is to produce a pattern table with fewer but longer entries than tables obtained with known approaches and thus improve the detection of attacks. A still further object is to define rules that specify when a deviation from the normal behavior is significant enough to raise an alarm.

Summary of the Invention 

[0015] It appears that the idea of investigating sequences is very important, but that building fixed-length sequences leads to unsatisfactory results, at least when these sequences do not exceed a certain minimal length. Therefore, the invention uses a new approach by focusing on a novel algorithm, the Teiresias algorithm, as described by I. Rigoutsos and A. Floratos in "Combinatorial Pattern Discovery in Biological Sequences - The TEIRESIAS Algorithm" in Bioinformatics, pp. 55-67, Vol. 14, No. 1, 1998. This algorithm is also the subject of US patent 6 108 666 to Floratos et al.
[0016] The Teiresias algorithm, developed for a different purpose and never considered for intrusion detection, is used to search for patterns, i.e. all the subsequences that appear at least twice in a set of input sequences. Though there are other algorithms besides Teiresias that solve the problem of discovering all patterns, none of them is as efficient and fast as the Teiresias algorithm. Generally speaking, the results achieved with the Teiresias algorithm are far superior to anything produced by the prior art approaches.

[0017] A particular advantage is that, using the Teiresias algorithm, the longest patterns in a set of input sequences can be found. This is important since a table of long patterns appears to be more "representative" of a specific process than a table of short patterns. Since longer patterns usually contain more context information, it appears that they are more significant for a process than short patterns. On the other hand, short patterns are not necessarily unique for a specific process, but may appear in other processes. It is even possible that short patterns are part of an attack. The longer a pattern is, the lower is the probability that this pattern is part of other processes or even an attack. Consequently, it was found that there are attacks that can be detected with the new technique according to the present invention, which attacks remained undetected with other techniques.

[0018] A further advantage is that a small pattern table allows efficient pattern matching algorithms to be implemented, so that matching runs reasonably fast. Both advantages lead to an improved detection of attacks.

[0019] To summarize, the present invention provides a method and a system for reliably detecting intrusion patterns, thereby minimizing the probability of false alarms.

[0020] The method and apparatus for an intrusion detection system according to the invention, using the described variable-length approach when investigating event patterns, operates in two modes, a training mode and an operation mode.

[0021] In the training mode, generally speaking, the behavior of a process is defined based on the system events it generates. System events are either the system calls that are invoked by the process or the audit events generated on behalf of the process. The process model is a table of patterns, i.e. sequences or subsequences of events, which are representative of the process examined. To get a complete picture of the process, it is important that as many different event sequences as possible are generated and analyzed.
[0022] In this training mode, variable-length patterns are retrieved from the event sequences generated by or on behalf of the process. All events of a specific type generated from the invocation of the process until its end constitute an event sequence. Different process invocations may result in different event sequences. Patterns are subsequences of the event sequences; patterns that are characteristic for the process are stored in a pattern table. The pattern table represents the process model.
[0023] In the operation mode of the present invention, it is decided whether the event streams created on behalf of the process can be matched by the patterns in the pattern table, which corresponds to a normal process behavior, or whether there are subsequences of unmatched events. Unmatched events represent a deviation from the normal behavior and may thus indicate an intrusion or misuse, called an attack. Significant deviations result in raising an alarm.
[0024] As already mentioned, the present invention is advantageous because patterns are generated that are "natural" for the description of the process behavior. The use of variable-length patterns to build the set of representative patterns results in a pattern table with fewer, but longer entries than tables obtained with other approaches. As explained, longer patterns contain more context information and are therefore more representative for a particular process than short patterns. Furthermore, a small pattern table increases the speed of the detection process. It is obvious that, when looking for a pattern that matches part of a given sequence, searching in a small set of patterns is faster than in a large set. Therefore, small pattern tables speed up the pattern matching process.

[0025] The invention is defined according to claims 1 (method) and 10 (apparatus).

Brief Description of the Drawings 

[0026] The foregoing and other features and advantages of the invention will be apparent from the following more detailed description of a preferred embodiment of the invention, as illustrated in the accompanying drawings, in which:

Figure 1 shows the components of an intrusion detection system according to the invention, based on the analysis of event patterns;

Figure 2 is a sample output of the Event Recording component;

Figure 3 is a sample output of the Process Filtering component;

Figure 4 is a sample output of the Translation component;

Figure 5 is a sample output of the Reduction and Aggregation component;

Figure 6 shows the patterns as detected by the Teiresias algorithm for a set of sample input strings;

Figure 7 is an illustration of the pattern reduction algorithm applied to the sample string set introduced in Figure 6;

Figure 8 is an illustration of the pattern matching algorithm for a case where the input string can be covered; and

Figure 9 is an illustration of the pattern matching algorithm for a case where the input string cannot be completely covered.

Detailed Description of the Invention 

[0027] In the following first section, the components of the intrusion detection system are described. In a subsequent second section, the algorithms used in selected components are discussed.



1. The Intrusion Detection System and its Components 

[0028] Figure 1 shows the components of the intrusion detection system. The system consists of two parts: an off-line part and an on-line part. The off-line part represents the training phase or mode, and the on-line part the real operation or operation mode. In the training mode, a model of the normal behavior of the process examined is generated. In the operation mode, the instantiations of the process under the observation of the intrusion detection system are compared to the process model and, if a significant deviation is observed, an alarm 136 is raised.

[0029] A process execution 103 can trigger different 
types of events 104. Either one of the following two 
event sources can be used for the present invention: 

• C2 audit events as they are recorded by the auditing 
system available on most UNIX and some other op- 
erating systems. 

• System calls as they are recorded by programs 
coming with the operating system, e.g. strace, or by 
other system utilities. 

[0030] It has to be noted that the two sources cannot be used interchangeably. Either audit events or system calls have to be considered. Any further use of the term event in this document may relate to audit events as well as system calls.

[0031] In the off-line part, it is possible to influence the
process invocation in order to exercise as many different 
process execution paths as possible. For this purpose, 
we use the functional verification tests (FVT) as they are 
used by software developers to test all the different sub- 
commands that can be executed by a process. 
[0032] Other approaches would be to define manually 
a set of subcommands that are expected to cover all the 
process execution paths, or to just record the events of 
the process running in a real environment. 
[0033] The events generated on behalf of a process are recorded by an event recording component 105. Event recording component 105 may record events not only of the process examined but also of other processes in the system. For example, the auditing system only allows audit events to be collected at the system level, i.e. either for all processes or for none. An event is described by several attributes, e.g. the process name, the event name, the process id, the parent process id, or the user id.

[0034] Event recording component 105 forwards the events to training system 102. Events are forwarded as triples comprising process name, event name, and process id, labeled 106. Figure 2 shows a sample of the events that are sent from event recording component 105 to a filtering component 107 in training system 102. Filtering component 107 first groups together the events belonging to the same process while keeping the chronological order of the events. All events belonging to the same process are called an event sequence. An event sequence consists of a unique identifier and a list of events. The events are given as tuples comprising the process and the event name. Not all event sequences are needed for the further processing. Only those sequences which are needed to analyze the behavior of the examined process are forwarded to a translation component 109.

[0035] Figure 3 shows the events of Figure 2 after ap- 
plying the filtering component 107. 
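The grouping and filtering performed by filtering component 107 might be sketched as follows in Python. The selection criterion (keep a sequence if it contains at least one event of the examined process) is an assumption, since the text does not spell it out, and the event names and process ids are hypothetical.

```python
from collections import defaultdict


def group_events(events):
    """Group (process_name, event_name, pid) triples by pid in chronological order."""
    sequences = defaultdict(list)
    for process_name, event_name, pid in events:
        sequences[pid].append((process_name, event_name))
    return dict(sequences)


def filter_sequences(sequences, target_process):
    """Assumed criterion: keep sequences with at least one event of the target process."""
    return {pid: seq for pid, seq in sequences.items()
            if any(name == target_process for name, _ in seq)}


# Hypothetical recorded events, already in chronological order.
events = [("ftpd", "FILE_Open", 101),
          ("cron", "PROC_Create", 200),
          ("ftpd", "FILE_Close", 101),
          ("ftpd", "FILE_Open", 102)]

kept = filter_sequences(group_events(events), "ftpd")
```

Here the process id serves as the unique identifier of each event sequence, and each sequence is a list of (process name, event name) tuples.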
[0036] For the further internal processing, translation component 109 translates the event sequences into an internal data representation. Each event, i.e. the tuple consisting of the process and the event name, is translated into a single character of an alphabet $\Sigma$. The output of translation component 109 consists of strings of characters, labeled 110.

[0037] The translation is bijective, i.e. identical events are translated into the same character, and a character is the translation of identical events only. The translation rules are generated on the fly and stored in a translation table 134. Figure 4 shows the result of the translation of the event sequences into strings.
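A translation table that is filled on the fly can be sketched as below. The class name, the choice of alphabet, and the dummy character `?` are assumptions made for illustration; the look-up-only mode corresponds to the on-line behavior the document describes for events without a table entry.

```python
class Translator:
    """Maps each distinct event to one character; new entries are created on demand."""

    def __init__(self, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", dummy="?"):
        self.alphabet = alphabet
        self.dummy = dummy
        self.table = {}  # event -> character (the translation table)

    def learn(self, event):
        """Return the character for an event, assigning an unused one if needed."""
        if event not in self.table:
            if len(self.table) >= len(self.alphabet):
                raise ValueError("alphabet exhausted")
            self.table[event] = self.alphabet[len(self.table)]
        return self.table[event]

    def translate_training(self, sequence):
        """Off-line mode: extend the table while translating."""
        return "".join(self.learn(event) for event in sequence)

    def translate_online(self, sequence):
        """On-line mode: look up only; unknown events become the dummy character."""
        return "".join(self.table.get(event, self.dummy) for event in sequence)


translator = Translator()
trained = translator.translate_training(
    [("ftpd", "open"), ("ftpd", "close"), ("ftpd", "open")])
observed = translator.translate_online([("ftpd", "close"), ("ftpd", "exec")])
```

Because every event receives its own unused character, the mapping stays one-to-one as long as the alphabet is not exhausted.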
[0038] The strings can be reduced further. A reduction 
and aggregation component 111 performs two tasks: 

• Duplicate strings are removed. 

• Consecutive occurrences of the same character are 
aggregated into a smaller number of the same char- 
acter. 

[0039] The first task results in a set of unique strings. 
Duplicate strings do not add any value to the further 
processing as will be seen later. 
[0040] It can be observed that subsequences of N, $N \gg 1$, identical events are quite frequent, with N exhibiting small variations. An example is the ftp login session, where the ftp daemon closes several file handles inherited from the inetd process. Since the inetd process is not always in the same state, the number of its file handles may vary. As a consequence, the ftp daemon does not always inherit the same number of file handles. Closing all the unneeded file handles therefore results in a varying number of file close operations.
[0041] There are two possible ways to aggregate characters:

• The identical consecutive characters are replaced with an extra, not yet used character.

• The N identical consecutive characters are compressed into M, $1 \le M \le N$, characters.

[0042] The first approach increases the number of 
unique events and possibly also the number of patterns. 
Since the number of patterns should be kept small, the 
second approach with M = 1 has been selected. The 
newly created strings have less semantics than the orig- 
inal ones, but no case is known where the character ag- 
gregation impacts the operation of the intrusion detec- 
tion system. 

[0043] The output of the reduction and aggregation component 111 consists of unique strings, labeled 112, where consecutive occurrences of the same character are removed. Figure 5 shows the strings of Figure 4 after being processed by reduction and aggregation component 111.
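The two tasks of the reduction and aggregation component can be sketched in a few lines of Python. The order of the two steps (aggregate first, then remove duplicates) is an assumption, as are the function names; the aggregation uses M = 1 as selected in the text.

```python
from itertools import groupby


def aggregate(s):
    """Collapse each run of identical consecutive characters to one character (M = 1)."""
    return "".join(ch for ch, _ in groupby(s))


def reduce_and_aggregate(strings):
    """Aggregate every string, then drop duplicates while keeping first-seen order."""
    unique = {}
    for s in strings:
        unique.setdefault(aggregate(s), None)
    return list(unique)


reduced = reduce_and_aggregate(["ABBBC", "ABC", "ACCD"])
```

In this example the first two strings become identical after aggregation, so only two strings remain.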

[0044] A pattern extraction component 113 determines the patterns which constitute the process model and stores them in a pattern table 135. The algorithms used to build pattern table 135 are explained in detail in subsequent sections.

[0045] Pattern table 135 is a key part of the intrusion detection system according to the invention. It links the off-line system 102 with the on-line system 122.
[0046] The on-line part has nearly the same components as the off-line part. However, a main difference is that the process to be examined is not under control of the intrusion detection system. In on-line system 122, the event recording component 125 and the process filtering component 127 are the same as in the off-line system. The translation component 129 differs in that audit events are translated based on the entries retrieved from a translation table 134. If there is an event for which no entry in translation table 134 exists, the event is translated to a dummy character. For each event sequence, it has to be decided whether there is a sign of an intrusion or not. Therefore, there is no reduction component (like component 111 in the off-line part) that removes duplicate sequences as in the training system. There is only an aggregation component 131.

[0047] The pattern matching component 133 receives its input strings 132 from the aggregation component 131. By applying the algorithm described in the subsequent section of this description, an attempt is made to match all the input strings with the patterns of pattern table 135. However, there may be strings that remain with uncovered characters. Depending on the number of consecutively uncovered characters, it is decided whether there is an indication of an intrusion, and whether an alarm 136 has to be issued.

[0048] For the ftp daemon, a threshold of 6 characters was selected, i.e. if there are 7 consecutive uncovered characters, an alarm is issued.
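The alarm rule can be illustrated with a short Python sketch. It assumes that the pattern matching step has already produced, for each character of an input string, a flag saying whether that character is covered; the function names and this boolean representation are hypothetical.

```python
def longest_uncovered_run(covered):
    """Length of the longest run of consecutive uncovered characters."""
    longest = current = 0
    for is_covered in covered:
        current = 0 if is_covered else current + 1
        longest = max(longest, current)
    return longest


def raises_alarm(covered, threshold=6):
    """Alarm if more than `threshold` consecutive characters remain uncovered."""
    return longest_uncovered_run(covered) > threshold


# Seven uncovered characters in a row exceed the threshold of 6.
seven_uncovered = [True] * 3 + [False] * 7 + [True] * 2
six_uncovered = [True] + [False] * 6 + [True]
```

With the threshold of 6 quoted for the ftp daemon, the first example triggers an alarm and the second does not.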

2. Algorithms 

[0049] In this section, a sample algorithm to build the pattern table and a sample algorithm to cover the input stream are described. They represent the best algorithms we know so far. However, we can think of variations of these algorithms. For example, the algorithm to build the pattern table sorts the patterns based on the number of characters they can cover at the beginning and end of an input string. We can think of other sort criteria like the total number of characters covered by a pattern or the number of occurrences of a pattern.

2.1 Terminology and Notation 

[0050] Consider a finite set of characters $\Sigma = \{c_1, c_2, \ldots, c_k\}$. The set $\Sigma$ is called an alphabet. To denote a string of $n$, $n > 0$, identical consecutive characters $c \in \Sigma$, we write $c^n$. The term $c^*$ denotes a string of identical consecutive characters of arbitrary length $l$, $l \ge 0$. The term $c^+$ denotes a string of identical consecutive characters of arbitrary length $m$, $m > 0$. To denote an arbitrary string of length $n$, $n > 0$, we write $\{.\}^n$. $\{.\}^*$ denotes an arbitrary string of arbitrary length $l$, $l \ge 0$, and $\{.\}^+$ denotes an arbitrary string of length $m$, $m > 0$.

[0051] The length of a string $s$ is written as $|s|$. We write $c \in s$ if the character $c$ is contained in the string $s$.
[0052] Given is a set of strings $S = \{s_1, s_2, \ldots, s_n\}$ over the alphabet $\Sigma$. A substring $p$ that

• occurs at least twice in the set of strings $S$, and

• has a length $|p|$ of two or more characters

is called a pattern.

[0053] $p^n$ denotes the pattern $p$ repeated $n$, $n > 0$, times, $p^*$ denotes the pattern $p$ repeated $l$, $l \ge 0$, times, and $p^+$ denotes the pattern $p$ repeated $m$, $m > 0$, times.
[0054] A pattern $p$ is maximal if there is no pattern $q$ for which holds:

• $p$ is a substring of $q$ with $|p| < |q|$, and

• the number of occurrences of the pattern $q$ in $S$ is equal to or larger than the number of occurrences of the pattern $p$ in $S$.

[0055] A character $c \in s$ is said to be covered by the pattern $p$ if $c \in p$ and $p$ is a substring of $s$.
[0056] A string $s$ is said to be covered by a set of patterns $P$ if for each character $c$, $c \in s$, there is a pattern $p$, $p \in P$, so that $c$ is covered by $p$.

[0057] A set of strings $S$ is said to be covered by a set of patterns $P$ if each string $s$, $s \in S$, is covered by $P$. Additionally, $P$ is said to cover $S$.
[0058] Given are a pattern $p$ and a string $s$. Let us decompose the string $s$ as follows:

$$s = p^l \, \{.\}^* \, p^r \qquad (l, r \ge 0)$$

[0059] It is assumed that the decomposition is maximal, i.e. there are no $l'$ and $r'$ for which holds $l' + r' > l + r$.
[0060] The expression $(l + r) \cdot |p|$, i.e. the sum $l + r$ times the pattern length $|p|$, is called margin cover of the pattern $p$ and the string $s$. It is written as $\mathrm{mCover}(p,s)$. The margin cover of a pattern $p$ and a string set $S = \{s_1, s_2, \ldots, s_n\}$, written as $\mathrm{mCover}(p,S)$, is defined as

$$\mathrm{mCover}(p,S) = \sum_{i=1}^{n} \mathrm{mCover}(p,s_i).$$

[0061] The total cover of a pattern $p$ and a string set $S$, $\mathrm{tCover}(p,S)$, is the total number of characters that can be covered by the pattern $p$.
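The margin cover and total cover definitions can be made concrete with a small Python sketch. The function names are hypothetical. The margin cover counts repetitions of the pattern greedily from the left and then from the right of the remainder, which yields a maximal l + r; the total cover is read here as the number of character positions lying inside at least one occurrence of the pattern, overlapping occurrences included, which is an assumption about the intended meaning.

```python
def mcover_single(p, s):
    """Margin cover of pattern p and string s: (l + r) * |p| for s = p^l {.}* p^r."""
    if not p:
        raise ValueError("pattern must be non-empty")
    left = 0
    while s.startswith(p, left * len(p)):
        left += 1
    rest = s[left * len(p):]  # what remains after the leading repetitions
    right = 0
    while rest.endswith(p, 0, len(rest) - right * len(p)):
        right += 1
    return (left + right) * len(p)


def mcover(p, strings):
    """Margin cover of p over a string set: sum of the per-string margin covers."""
    return sum(mcover_single(p, s) for s in strings)


def tcover(p, strings):
    """Assumed reading of total cover: positions inside any occurrence of p."""
    total = 0
    for s in strings:
        covered = [False] * len(s)
        start = s.find(p)
        while start != -1:
            for i in range(start, start + len(p)):
                covered[i] = True
            start = s.find(p, start + 1)
        total += sum(covered)
    return total


sample = ["ababcdab", "xaby"]
```

For the pattern `ab`, the first sample string decomposes with l = 2 and r = 1, while the second contributes nothing to the margin cover because `ab` occurs only in its interior.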

2.2 Determining the Set of Maximal Variable-Length 
Patterns 


[0062] In a first step, all maximal patterns contained in the set $S$ of input strings have to be determined. For this purpose, the Teiresias algorithm, as described by I. Rigoutsos and A. Floratos in "Combinatorial Pattern Discovery in Biological Sequences - The TEIRESIAS Algorithm" in Bioinformatics, pp. 55-67, Vol. 14, No. 1, 1998, is used. A minimal pattern length $m$ can be specified as argument for the Teiresias algorithm. The Teiresias algorithm will then only find the maximal patterns whose length is equal to or greater than this given minimal length.

[0063] Part a) of Figure 6 shows a sample input set of 3 strings and part b) shows the corresponding pattern set as discovered by the Teiresias algorithm (with $m = 2$). For each pattern, the total number of occurrences (first column) as well as the number of strings in which it occurs (second column) is given.
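The Teiresias algorithm itself is not reproduced here. For very small inputs, the definition of a maximal pattern given in section 2.1 can instead be applied by brute force, as in the following Python sketch. This is a naive stand-in with roughly cubic cost in the string length, not the Teiresias algorithm; the function names are hypothetical and overlapping occurrences are counted, which is an assumption.

```python
from collections import Counter


def count_substrings(strings, min_len):
    """Count every occurrence of every substring of at least min_len characters."""
    counts = Counter()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + min_len, len(s) + 1):
                counts[s[i:j]] += 1
    return counts


def maximal_patterns(strings, min_len=2):
    """Return {pattern: occurrences} for all maximal patterns (naive enumeration)."""
    counts = count_substrings(strings, min_len)
    patterns = {p: c for p, c in counts.items() if c >= 2}
    maximal = {}
    for p, c in patterns.items():
        # p is not maximal if a longer pattern containing it occurs at least as often.
        dominated = any(len(q) > len(p) and p in q and cq >= c
                        for q, cq in patterns.items())
        if not dominated:
            maximal[p] = c
    return maximal


found = maximal_patterns(["abcab", "abc"], min_len=2)
```

In this toy input, `bc` occurs twice but only inside `abc`, which occurs equally often, so `bc` is not maximal; `ab` stays because it occurs three times.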

2.3 Reducing the Set of Patterns 


[0064] Out of the set of patterns $P$ consisting of all the maximal patterns found for the string set $S$, a subset of patterns $R$, $R \subseteq P$, is selected which covers $S$. As an example, the following algorithm can be used to build the reduced pattern set $R$:

1. Let m denote the minimal pattern length that was used to generate the set of maximal variable-length patterns. Add each s, s ∈ S ∧ |s| < 2·m, to P and remove it from S.

2. If P = ∅, then add all s ∈ S to the reduced pattern set R and exit.

3. For each p ∈ P calculate mCover(p,S).

4. If there is a pattern p fulfilling mCover(p,S) > 0, then select a pattern r for which mCover(r,S) is maximal, i.e. there is no pattern q for which holds:

mCover(q,S) > mCover(r,S) or

mCover(q,S) = mCover(r,S) ∧ |q| > |r|.

5. Add r to the reduced pattern set R and remove it from P.

6. Remove first all the matching substrings adjacent to the beginning and end of a string, i.e. remove strings of the form s = r*, and replace strings of the form s = r*s' or s = s''r* with s' or s'', respectively.

7. Remove then the matching substrings that are not adjacent to the beginning and end of a string, i.e. as long as there is an s ∈ S, s = s'r^+s'' = {.}^v {.}* r^+ {.}* {.}^w, m being the minimal pattern length and v, w > 0, replace s with the two strings s' and s''. v and w specify the minimal length of the resulting new strings s' and s'', respectively. Setting them equal to m enforces that all patterns have a minimal length m. However, different settings are possible.

8. If one of the strings s that have been newly added to S has a length |s| < 2·m, m being the minimal pattern length, remove s from the set of strings S and add it to the set of patterns P.

9. If S ≠ ∅, go to step 2.
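The core of the reduction, i.e. steps 3 to 6, can be sketched in Python as follows. This is a simplified illustration, not the complete algorithm: the transfer of short strings to P (steps 1 and 8) and the splitting of strings at inner matches (step 7) are deliberately left out, and the names `margins` and `reduce_patterns` are hypothetical. On the sample of Figure 6 the sketch selects the same patterns, in the same order, as Figure 7.

```python
def margins(p, s):
    """Return (l, r): copies of p at the start and end of s with l + r maximal."""
    n = len(p)
    if n == 0:
        return (0, 0)
    best, l = (0, 0), 0
    while True:
        start, r, end = l * n, 0, len(s)
        while end - n >= start and s.startswith(p, end - n):
            r += 1
            end -= n
        if l + r > sum(best):
            best = (l, r)
        if not s.startswith(p, start):
            return best
        l += 1


def reduce_patterns(patterns, strings):
    """Greedy selection by margin cover (steps 3 to 6 only)."""
    P, S, R = list(patterns), [s for s in strings if s], []

    def cover(p):
        return sum(sum(margins(p, s)) * len(p) for s in S)

    while S and P:
        # Step 4: highest mCover wins; ties go to the longer pattern.
        r = max(P, key=lambda p: (cover(p), len(p)))
        if cover(r) == 0:
            break
        R.append(r)                        # step 5
        P.remove(r)
        stripped = []
        for s in S:                        # step 6: cut matches at both margins
            lead, trail = margins(r, s)
            rest = s[lead * len(r): len(s) - trail * len(r)]
            if rest:
                stripped.append(rest)
        S = stripped
    return R, S


R, rest = reduce_patterns(
    ["BC", "DE", "EA", "ABCD", "DEA", "FDE"],
    ["ABCDEA", "BCFDEABCD", "BCEADEFDE"],
)
print(R)       # ['ABCD', 'FDE', 'BC', 'EA', 'DE']
print(rest)    # []
```

The pattern "DEA" is never selected because its margin cover drops to 0 after the first step, which matches the observation that not all patterns are needed to cover the strings.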

[0065] Figure 7 illustrates the pattern reduction algorithm applied to the sample string set introduced in Figure 6. For each reduction step, the string set, the pattern set, and the reduced pattern set are shown. For each pattern in the pattern set, its mCover value is listed. The pattern with the highest mCover value is moved to the reduced pattern set, and matching substrings are removed from the string set. In this example, not all patterns are needed to cover all the strings.

[0066] In the example of the ftp daemon, the Teiresias algorithm determines about 600 maximal variable-length patterns. After applying the reduction algorithm, about 50 patterns remain.

2.4 Pattern Matching 

[0067] We describe a sample pattern matching algorithm. The algorithm tries to match the input stream by concatenating patterns, i.e. the patterns are placed one right after the other. A variation would be to allow overlapping patterns.

[0068] At certain points of the pattern matching process, there may be several patterns that match the input stream, and it has to be decided which pattern to select. As a heuristic, a pattern is selected if a sequence of d, d > 0, patterns can be found that matches the input stream right after the pattern under consideration.

1. Set the counter of consecutively uncovered characters, u, to 0.

2. Wait until there are at least k = d·|p_mean| characters in the input stream, where d is the parameter as explained in the introduction to this algorithm and |p_mean| the mean length of all patterns p ∈ P, or until the end of an input sequence has been reached.

3. Find a pattern p ∈ P that covers the beginning of the input stream T. If no pattern can be found, go to step 6.

4. Find d > 0 patterns q_1, ..., q_d, so that the string t = p q_1 q_2 ... q_d covers the beginning of the stream. If there are e patterns q_1, ..., q_e, 0 < e < d, that cover the whole input sequence, set t = p q_1 q_2 ... q_e.

(a) If t matches the whole input sequence, remove the input sequence and go to step 1.

(b) If d patterns can be found covering the beginning of the input stream, remove the pattern p from the input stream and go to step 1.

5. Determine all pattern combinations that cover the beginning of the input stream. Select the pattern combination that covers the longest character sequence, remove it from the input stream, and go to step 1.

6. Skip one character and increase u by 1.

7. If u = n + 1, n being the threshold for the number of consecutively uncovered characters, raise an alarm.

8. Go to step 2.
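The matching steps above can be sketched in Python under some simplifying assumptions: the input is a complete string rather than a stream (so the waiting of step 2 is omitted), all patterns are non-empty, and the look-ahead simply tries every pattern recursively. The function names `lookahead`, `longest_cover` and `match` are hypothetical. The two calls at the end use the pattern table and sample strings of Figures 8 and 9 with d = 3.

```python
def lookahead(patterns, text, d):
    """True if d patterns in a row match the start of text, or fewer cover all of it."""
    if d == 0 or not text:
        return True
    return any(
        text.startswith(q) and lookahead(patterns, text[len(q):], d - 1)
        for q in patterns
    )


def longest_cover(patterns, text):
    """Length of the longest prefix of text that is a concatenation of patterns."""
    return max(
        (len(q) + longest_cover(patterns, text[len(q):])
         for q in patterns if text.startswith(q)),
        default=0,
    )


def match(patterns, text, d, n):
    """Return (covered pieces, skipped characters, alarm raised)."""
    pos, u = 0, 0
    covered, skipped, alarm = [], [], False
    while pos < len(text):
        rest = text[pos:]
        candidates = [p for p in patterns if rest.startswith(p)]
        if not candidates:                 # steps 6 and 7: skip one character
            skipped.append(rest[0])
            pos += 1
            u += 1
            if u == n + 1:
                alarm = True
            continue
        for p in candidates:               # step 4: validate p by look-ahead
            if lookahead(patterns, rest[len(p):], d):
                length = len(p)
                break
        else:                              # step 5: longest pattern combination
            length = longest_cover(patterns, rest)
        covered.append(rest[:length])
        pos += length
        u = 0                              # step 1: reset the counter
    return covered, "".join(skipped), alarm


table = ["ABCD", "XYZD", "ABC", "GHI", "XYZ", "HF"]
print(match(table, "ABCABCDXYZGHI", d=3, n=3))
# (['ABC', 'ABCD', 'XYZ', 'GHI'], '', False)
print(match(table, "ABCABCDKMWHF", d=3, n=3))
# (['ABCABCD', 'HF'], 'KMW', False)
```

With a threshold of n = 2 the second call would raise the alarm, because three consecutive characters ("KMW") remain uncovered.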

[0069] Figures 8 and 9 are two illustrations of the pattern matching algorithm: one for the case of a fully covered string, one for the case of a partially covered string.

[0070] The pattern matching algorithm processes the pattern table found on top of Figures 8 and 9. In Figure 8, the matching algorithm first finds the pattern "ABC" matching the beginning of the sample string. Before accepting this pattern as a valid match, the pattern "ABC" must be validated by finding d=3 patterns matching the remainder "ABCDXYZGHI" of the string. Since three such patterns can be found, namely "ABCD", "XYZ", and "GHI", the pattern "ABC" is accepted and deleted from the sample string. Processing continues with the string "ABCDXYZGHI".

[0071] In Figure 9, we have again the same pattern table as in Figure 8. However, the sample string to be matched is somewhat different. Again, the pattern "ABC" is selected as candidate to match the beginning of the sample string. However, we cannot find d=3 patterns that match the remainder "ABCDKMWHF" of the string. This implies that another pattern that matches the beginning of the sample string should be tried. Since there is no such other pattern, the pattern sequence that matches the longest portion of the sample string has to be found. Because the two patterns "ABC" and "ABCD" match the longest subsequence, they are selected. The characters "KMW" cannot be matched and are skipped. Processing continues with the string "HF".

[0072] A pattern-oriented, behavior-based, variable-length intrusion detection model was defined. The main advantage of this inventive model is that it generates a kind of "natural" patterns or signatures of the process to be monitored, and these patterns represent the "normal" behavior very well. Thus, deviations, which indicate intrusion or misuse, can be detected more easily than with previously known methods.

[0073] While the present invention has been particularly shown with reference to one specific embodiment, it is obvious to someone skilled in the art that it can be adapted to match the environment in which it is going to be used, whether for the detection of unauthorized transactions in the banking arena, of viruses in a computer network, of unauthorized entry to buildings with restricted access, or of unallowed data exchange between data bases, to name a few.



Claims 

1. A method for detecting intrusion attempts in a computer or computer system, said method comprising in combination:

in a training mode, building a table of characteristic, process-constituting patterns defining normal behavior of a model process in said computer or computer system by performing the following steps:

• building a first event sequence by filtering a first event stream generated by said model process,

• by using the so-called Teiresias algorithm, extracting event sequence patterns from said first event sequence, said patterns constituting said model process,

• storing said process-constituting patterns; and

in an operation mode, extracting characteristic patterns from an actual process by performing the following steps:

• building a second event sequence by filtering an event stream generated by said actual process,

• matching said second event sequence with said stored process-constituting patterns, and

• indicating the result of said matching step.

2. The method for intrusion detection according to claim 1, wherein

• in the training mode, the first event sequence is translated and the rules used for said translation are stored,

• in the operation mode, the second event sequence is translated using said stored translation rules.

3. The method for intrusion detection according to claim 2, wherein the translation is a dynamic, on-the-fly translation.

4. The method for intrusion detection according to any of the claims 1 to 3, wherein

• in the training mode, the first event sequence is compressed according to a given set of aggregation rules,

• in the operation mode, the second event sequence is compressed using said given set of aggregation rules.

5. The method for intrusion detection according to one or more of the preceding claims, wherein

• an event stream generated by either one or both of the processes is recorded and

• said recorded event stream is filtered to build the first and/or second event sequence.

6. The method for intrusion detection according to one 
or more of the preceding claims, wherein the proc- 
ess-constituting patterns of the first event sequenc- 
es contain patterns of varying lengths, in particular 
patterns of maximal lengths. 

7. The method for intrusion detection according to one or more of the preceding claims, further including a reduction step in the training mode, whereby any duplications in the obtained event sequence are removed.

8. The method for intrusion detection according to one 
or more of the preceding claims, wherein the train- 
ing mode is carried out under the control of the in- 
trusion detection system. 

9. The method for intrusion detection according to one or more of the claims 2 to 7, wherein all or part of the method steps are applied in the following sequences

in the training mode:

1. event recording,

2. process filtering,

3. translation and storage of translation rules,

4. reduction and aggregation,

5. pattern extraction and storage; and

in the operation mode:

1. event recording,

2. process filtering,

3. translation based on stored translation rules,

4. aggregation,

5. pattern matching with stored patterns.

10. An apparatus for detecting intrusion attempts in a computer or computer system, said apparatus comprising in combination:

• a first filtering component (107) for filtering, in a training mode branch, a first event stream generated by a model process (103) and building a first event sequence (108),

• a pattern extraction component (113) extracting event sequence patterns from said first event sequence by using the so-called Teiresias algorithm, said patterns constituting said model process,

• a pattern table component (135) storing said extracted, process-constituting patterns defining normal behavior of said model process,

• a second filtering component (127) for building, in an operation mode branch, a second event sequence by filtering an event stream generated by an actual process (123),

• a pattern matching component (133) for matching said second event sequence with said process-constituting patterns stored in said pattern table component (135), and

• an indicator component (136) for indicating the output of said matching component (133).

11. The apparatus for intrusion detection according to claim 10, further comprising

• a first translation component (109) which, in the training mode branch, translates the first event sequence,

• a translation table (134) for storing the translation rules used for said translation,

• a second translation component (129) which, in the operation mode branch, translates the second event sequence, using said translation rules stored in said translation table (134).

12. The apparatus for intrusion detection according to any of the claims 10 and 11, further comprising

• a first compression component (111) which, in the training mode branch, compresses the first event sequence according to a given set of aggregation rules, and

• a second compression component (131) which, in the operation mode branch, compresses the second event sequence using said given set of aggregation rules.

13. The apparatus for intrusion detection according to claim 12, further comprising a reduction component for removing duplicates in the event sequence obtained in the training mode branch, in particular a reduction component combined with the first compression component (111).

14. The apparatus for intrusion detection according to one or more of the claims 10 to 13, wherein all or part of the components are arranged in the following working sequences

in the training mode branch:

1. event recording component (105),

2. process filtering component (107),

3. translation component (109) and translation table (134),

4. reduction and aggregation component (111),

5. pattern extraction component (113) and pattern table (135); and

in the operational mode branch:

1. event recording component (125),

2. process filtering component (127),

3. translation component (129), connected to said translation table (134),

4. aggregation component (131),

5. pattern matching component (133), connected to said pattern table (135).
15. The apparatus for intrusion detection according to one or more of the preceding claims, wherein all or part of the components are arranged such that, to avoid duplication, they can be used alternatively either in the training mode branch or the operation mode branch.



Patentansprüche (Claims)

1. A method for detecting intrusion attempts in a computer or computer system, the method comprising a combination of the following modes and steps:

a training mode, in which a table of characteristic, process-constituting patterns defining the normal behavior of a model process in the computer or computer system is built by performing the following steps:

• building a first event sequence by filtering a first event stream generated by the model process,

• using the so-called Teiresias algorithm to extract event sequence patterns from the first event sequence, the patterns constituting the model process,

• storing the process-constituting patterns; and

an operation mode, in which characteristic patterns are extracted from a real process by performing the following steps:

• building a second event sequence by filtering an event stream generated by the real process,

• matching the first event sequence with the stored process-constituting patterns, and

• indicating the result of the matching step.

2. The method for intrusion detection according to claim 1, wherein

• the first event sequence is translated in the training mode and the rules applied in the translation are stored,

• the second event sequence is translated in the operation mode using the stored translation rules.

3. The method for intrusion detection according to claim 2, wherein the translation is a dynamic translation and takes place during processing.

4. The method for intrusion detection according to any of claims 1 to 3, wherein

• the first event sequence is compressed in the training mode according to a given set of compression rules,

• the second event sequence is compressed in the operation mode using the given set of compression rules.

5. The method for intrusion detection according to one or more of the preceding claims, wherein

• an event stream generated by one or both processes is recorded and

• the recorded event stream is filtered to build the first and/or second event sequence.

6. The method for intrusion detection according to one or more of the preceding claims, wherein the process-constituting patterns of the first event sequences contain patterns of variable length, in particular patterns of maximal lengths.

7. The method for intrusion detection according to one or more of the preceding claims, which further includes, in the training mode, a reduction step by which all duplicate patterns are removed from the obtained event sequence.

8. The method for intrusion detection according to one or more of the preceding claims, wherein the training mode is carried out under the control of the intrusion detection system.

9. The method for intrusion detection according to one or more of claims 2 to 7, wherein all or some of the method steps are carried out in the following sequences, namely

in the training mode:

1. recording the event,

2. filtering the process,

3. translating and storing the translation rules,

4. reducing and compressing,

5. extracting and storing the patterns; and

in the operation mode:

1. recording the event,

2. filtering the process,

3. translating on the basis of the stored translation rules,

4. compressing,

5. matching the patterns with the stored patterns.

10. An apparatus for detecting intrusion attempts in a computer or computer system, the apparatus comprising a combination of the following components:

• a first filtering component (107) for filtering, in a training mode branch, a first event stream generated by a model process (103) and for building a first event sequence (108),

• a pattern extraction component (113) for extracting event sequence patterns from the first event sequence using the so-called Teiresias algorithm, the patterns constituting the model process,

• a pattern table component (135) for storing the extracted, process-constituting patterns which define the normal behavior of the model process,

• a second filtering component (127) for building a second event sequence in an operation mode branch by filtering an event stream generated by a real process (123),

• a pattern matching component (133) for matching the second event sequence with the process-constituting patterns stored in the pattern table component (135), and

• an indicator component (136) for indicating the output value of the matching component (133).

11. The apparatus for intrusion detection according to claim 10, further comprising:

• a first translation component (109) which translates the first event sequence in the training mode branch,

• a translation table (134) for storing the translation rules applied in the translation,

• a second translation component (129) which translates the second event data sequence in the operation mode branch using the translation rules stored in the translation table (134).

12. The apparatus for intrusion detection according to any of claims 10 and 11, the apparatus further comprising:

• a first compression component (111) which, in the training mode branch, compresses the first event sequence according to a given set of compression rules, and

• a second compression component (131) which, in the operation mode branch, compresses the second event sequence using a given set of compression rules.

13. The apparatus for intrusion detection according to claim 12, the apparatus further comprising a reduction component for removing duplicate patterns from the event sequence obtained in the training mode branch, and in particular a reduction component in combination with the first compression component (111).

14. The apparatus for intrusion detection according to one or more of claims 10 to 13, wherein all or some of the components are arranged in the following working sequences:

in the training mode branch:

1. event recording component (105),

2. process filtering component (107),

3. translation component (109) and translation table (134),

4. reduction and compression component (111),

5. pattern extraction component (113) and pattern table (135); and

in the operation mode branch:

1. event recording component (125),

2. process filtering component (127),

3. translation component (129) connected to the translation table (134),

4. compression component (131),

5. pattern matching component (133) connected to the pattern table (135).

15. The apparatus for intrusion detection according to one or more of the preceding claims, wherein the components are arranged in the apparatus such that, to avoid duplicates, they can be used alternatively either in the training mode branch or in the operation mode branch.

Revendications (Claims)

1. A method for detecting intrusion attempts in a computer or computer system, said method comprising, in combination:

in a training mode, building a table of characteristic, process-constituting patterns defining the normal behavior of a model process in the computer or said computer system, by performing the following steps:

- building a first event sequence by filtering a first event stream generated by said model process,

- by using what is called the Teiresias algorithm, extracting event sequence patterns from said first event sequence, said patterns constituting said model process,

- storing said process-constituting patterns; and

in an operation mode, extracting characteristic patterns from a real process by performing the following steps:

- building a second event sequence by filtering an event stream generated by said real process,

- matching said second event sequence with said stored process-constituting patterns, and

- indicating the result of said matching step.

2. The method for intrusion detection according to claim 1, wherein:

- in the training mode, the first event sequence is translated and the rules used for said translation are stored,

- in the operation mode, the second event sequence is translated using said stored translation rules.

3. The method for intrusion detection according to claim 2, wherein the translation is a dynamic, on-the-fly translation.

4. The method for intrusion detection according to any of claims 1 to 3, wherein:

- in the training mode, the first event sequence is compressed according to a given set of aggregation rules,

- in the operation mode, the second event sequence is compressed using said given set of aggregation rules.

5. The method for intrusion detection according to one or more of the preceding claims, wherein:

- an event stream generated by one or both of the processes is recorded and

- said recorded event stream is filtered to build the first and/or the second event sequence.

6. The method for intrusion detection according to one or more of the preceding claims, wherein the process-constituting patterns of the first event sequences contain patterns of different lengths, in particular patterns of maximal lengths.

7. The method for intrusion detection according to one or more of the preceding claims, further comprising a reduction step in the training mode, so as to remove any duplications in the obtained event sequence.

8. The method for intrusion detection according to one or more of the preceding claims, wherein the training mode is carried out under the control of the intrusion detection system.

9. The method for intrusion detection according to one or more of claims 2 to 7, wherein all or part of the method steps are applied in the following orders:

in the training mode:

- event recording,

- process filtering,

- translation and storage of the translation rules,

- reduction and aggregation,

- pattern extraction and storage; and

in the operation mode:

- event recording,

- process filtering,

- translation based on stored translation rules,

- aggregation,

- matching of the patterns with stored patterns.

10. The apparatus for detecting intrusion attempts in a computer or computer system, said apparatus comprising, in combination:

- a first filtering component (107) for filtering, in a training mode branch, an event stream generated by a model process (103), and building a first event sequence (108),

- a pattern extraction component (113), extracting event sequence patterns from said first event sequence by using what is called the Teiresias algorithm, said patterns constituting said model process,

- a pattern table component (135), storing said extracted, process-constituting patterns defining the normal behavior of said model process,

- a second filtering component (127), for building, in an operation mode branch, a second event sequence by filtering an event stream generated by a real process (123),

- a pattern matching component (133), for matching said second event sequence with said process-constituting patterns stored in said pattern table component (135), and

- an indicator component (136), for indicating the result of said matching component (133).

11. The apparatus for detecting intrusion attempts according to claim 10, further comprising:

- a first translation component (109) which, in the training mode branch, translates the first event sequence,

- a translation table (134), for storing the translation rules used for said translation,

- a second translation component (129) which, in the operation mode branch, translates the second event sequence using said translation rules stored in said translation table (134).

12. The apparatus for detecting intrusion attempts according to any of claims 10 and 11, further comprising:

- a first compression component (111) which, in the training mode branch, compresses the first event sequence according to a given set of aggregation rules, and

- a second compression component (131) which, in the operation mode branch, compresses the second event sequence by using said given set of aggregation rules.

13. The apparatus for intrusion detection according to claim 12, further comprising a reduction component for removing duplicates in the event sequence obtained in the training mode branch, in particular a reduction component combined with the first compression component (111).

14. The apparatus for intrusion detection according to one or more of claims 10 to 13, wherein all or part of the components are arranged in the following working sequences:

in the training mode branch:

- event recording component (105),

- process filtering component (107),

- translation component (109) and translation table (134),

- reduction and aggregation component (111),

- pattern extraction component (113) and pattern table (135); and

in the operation mode branch:

- event recording component (125),

- process filtering component (127),

- translation component (129), connected to said translation table (134),

- aggregation component (131),

- pattern matching component (133), connected to said pattern table (135).

15. The apparatus for intrusion detection according to one or more of the preceding claims, wherein all or part of the components are arranged such that, to avoid any duplication, they can be used alternatively either in the training mode branch or in the operation mode branch.

Process    Event            Process Id
...
ftpd       FILE_Close       16415
ls         PROC_Execute     16415
ls         FILE_Close       16415
fingerd    PROC_Execute     18210
ls         PROC_Delete      16415
fingerd    PROC_SetSignal   18210
ftpd       PROC_Create      18303
ftpd       FILE_Close       18303
ftpd       FILE_Close       18303
ftpd       FILE_Close       18303
fingerd    FILE_Read        18210
fingerd    PROC_Create      18210
ftpd       FILE_Close       18303
ftpd       PROC_SetSignal   18303
ftpd       FILE_Read        18303
ftpd       FILE_Read        18303
ftpd       FILE_Write       18303
ftpd       PROC_Delete      18303
fingerd    PROC_Execute     19415
fingerd    PROC_SetSignal   19415
fingerd    FILE_Read        19415
fingerd    PROC_Create      19415

Fig. 2

0: (ftpd, FILE_Close), (ls, PROC_Execute),
   (ls, FILE_Close), (ls, PROC_Delete)

1: (fingerd, PROC_Execute),
   (fingerd, PROC_SetSignal), (fingerd, FILE_Read),
   (fingerd, PROC_Create)

2: (ftpd, PROC_Create), (ftpd, FILE_Close),
   (ftpd, FILE_Close), (ftpd, FILE_Close),
   (ftpd, FILE_Close), (ftpd, PROC_SetSignal),
   (ftpd, FILE_Read), (ftpd, FILE_Read),
   (ftpd, FILE_Write), (ftpd, PROC_Delete)

3: (fingerd, PROC_Execute),
   (fingerd, PROC_SetSignal),
   (fingerd, FILE_Read), (fingerd, PROC_Create)

Fig. 3
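The step from the interleaved event records of Fig. 2 to the per-process sequences of Fig. 3 can be illustrated with a small Python sketch. It assumes that process filtering amounts to grouping the records by process id in order of first appearance, which is consistent with the two figures; the function name `filter_by_process` is hypothetical.

```python
def filter_by_process(records):
    """Group (process, event, pid) records into one event sequence per pid."""
    sequences = {}                          # dicts keep insertion order
    for process, event, pid in records:
        sequences.setdefault(pid, []).append((process, event))
    return list(sequences.values())


# Event records of Fig. 2, in recorded order.
FIG2 = [
    ("ftpd", "FILE_Close", 16415), ("ls", "PROC_Execute", 16415),
    ("ls", "FILE_Close", 16415), ("fingerd", "PROC_Execute", 18210),
    ("ls", "PROC_Delete", 16415), ("fingerd", "PROC_SetSignal", 18210),
    ("ftpd", "PROC_Create", 18303), ("ftpd", "FILE_Close", 18303),
    ("ftpd", "FILE_Close", 18303), ("ftpd", "FILE_Close", 18303),
    ("fingerd", "FILE_Read", 18210), ("fingerd", "PROC_Create", 18210),
    ("ftpd", "FILE_Close", 18303), ("ftpd", "PROC_SetSignal", 18303),
    ("ftpd", "FILE_Read", 18303), ("ftpd", "FILE_Read", 18303),
    ("ftpd", "FILE_Write", 18303), ("ftpd", "PROC_Delete", 18303),
    ("fingerd", "PROC_Execute", 19415), ("fingerd", "PROC_SetSignal", 19415),
    ("fingerd", "FILE_Read", 19415), ("fingerd", "PROC_Create", 19415),
]

for index, sequence in enumerate(filter_by_process(FIG2)):
    print(index, sequence)                  # four sequences, numbered 0 to 3
```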



0: ABCD
1: EFGH
2: IAAAAJKKLM
3: EFGH

Fig. 4
0: ABCD
1: EFGH
2: IAJKLM

Fig. 5
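How the sequences of Fig. 3 turn into the strings of Fig. 4 and Fig. 5 can be sketched in Python. The sketch rests on assumptions that merely agree with the figures and may be simpler than the actual rules: translation assigns letters in order of first appearance, aggregation collapses runs of identical consecutive events, and reduction drops duplicate sequences. All function names are hypothetical.

```python
from itertools import groupby


def translate(sequences, table=None):
    """Map each distinct event to a letter; the table holds the stored rules."""
    table = {} if table is None else table
    strings = []
    for sequence in sequences:
        letters = []
        for event in sequence:
            if event not in table:          # new translation rule, on the fly
                table[event] = chr(ord("A") + len(table))
            letters.append(table[event])
        strings.append("".join(letters))
    return strings, table


def aggregate(string):
    """Collapse each run of identical consecutive characters into one."""
    return "".join(key for key, _ in groupby(string))


def reduce_duplicates(strings):
    """Remove duplicate strings while keeping the original order."""
    return list(dict.fromkeys(strings))


fingerd = [("fingerd", e) for e in
           ("PROC_Execute", "PROC_SetSignal", "FILE_Read", "PROC_Create")]
FIG3 = [
    [("ftpd", "FILE_Close"), ("ls", "PROC_Execute"),
     ("ls", "FILE_Close"), ("ls", "PROC_Delete")],
    fingerd,
    [("ftpd", "PROC_Create")] + [("ftpd", "FILE_Close")] * 4
    + [("ftpd", e) for e in
       ("PROC_SetSignal", "FILE_Read", "FILE_Read", "FILE_Write", "PROC_Delete")],
    fingerd,
]

fig4, rules = translate(FIG3)
print(fig4)                                 # ['ABCD', 'EFGH', 'IAAAAJKKLM', 'EFGH']
fig5 = reduce_duplicates([aggregate(s) for s in fig4])
print(fig5)                                 # ['ABCD', 'EFGH', 'IAJKLM']
```

Passing the returned `rules` table back into `translate` in the operation mode reuses the translation rules stored during training.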



a) 0: ABCDEA          b) 4 3 BC
   1: BCFDEABCD          4 3 DE
   2: BCEADEFDE          3 3 EA
                         2 2 ABCD
                         2 2 DEA
                         2 2 FDE

Fig. 6
Step  Strings       Pattern set (mCover)   Reduced pattern set

1)    ABCDEA        BC    4
      BCFDEABCD     DE    2
      BCEADEFDE     EA    2
                    ABCD  8
                    DEA   3
                    FDE   3

2)    EA            BC    4                ABCD
      BCFDE         DE    4
      BCEADEFDE     EA    2
                    DEA   0
                    FDE   6

3)    EA            BC    4                ABCD
      BC            DE    2                FDE
      BCEADE        EA    2
                    DEA   0

4)    EA            DE    2                ABCD
      EADE          EA    4                FDE
                    DEA   0                BC

5)    DE            DE    2                ABCD
                    DEA   0                FDE
                                           BC
                                           EA

6)                  DEA                    ABCD
                                           FDE
                                           BC
                                           EA
                                           DE

Fig. 7

Table:                 ABCD XYZD ABC GHI XYZ HF
String:                ABCABCDXYZGHI    ABC selectable
Remainder of string:   ABCDXYZGHI       ABCD
Remainder of string:   XYZGHI           XYZ
Remainder of string:   GHI              GHI, ABC validated

Fig. 8



Table:                 ABCD XYZD ABC GHI XYZ HF
String:                ABCABCDKMWHF     ABC selectable
Remainder of string:   ABCDKMWHF
Possible combinations: ABC/ABC  -> length 6
                       ABC/ABCD -> length 7 -> selected
                       K  -> shifted, u = 1
                       M  -> shifted, u = 2
                       W  -> shifted, u = 3
                       HF -> u = 0

Fig. 9


